1 Abstract

In this manuscript I review the mathematics and physics that underpin recent
work using the clustering of galaxies to derive cosmological model constraints.
I start by describing the basic concepts, and gradually move on to some of the
complexities involved in analysing galaxy redshift surveys, focusing on the 2dF
Galaxy Redshift Survey (2dFGRS) and the Sloan Digital Sky Survey (SDSS).
Difficulties within such an analysis, particularly dealing with redshift space
distortions and galaxy bias, are highlighted. I then describe current observations
of the CMB fluctuation power spectrum, and consider the importance
of measurements of the clustering of galaxies in light of recent experiments.
Finally, I provide an example joint analysis of the latest CMB and large-scale
structure data, leading to a set of parameter constraints.
2 Introduction

The basic techniques required to analyse galaxy clustering were introduced
in the 70s, and have been subsequently refined to match data sets of
increasing quality and size. In this manuscript I have tried to summarise the
current state of this field. Obviously, such an attempt can never be complete
or unique in every detail, although it is still worthwhile as it is always useful
to have more than one source of information. An excellent alternative viewpoint
was recently provided by Hamilton [25, 26], which covers some of the
same material, and provides a more detailed review of some of the statistical
methods that are used. Additionally, it is worth directing the interested reader
to a number of good textbooks that cover this topic. In addition
to a description of the basic mathematics and physics behind a clustering
analysis, I have attempted to provide a discussion of some of the fundamental
and practical difficulties involved. The cosmological goal of such an analysis
is considered in the final part of this manuscript, where the combination of
cosmological constraints from galaxy clustering and the CMB is discussed, and
an example multi-parameter fit to recent data is considered.



3 Basics 

Our first step is to define the dimensionless overdensity 

$$\delta(\mathbf{x}) = \frac{\rho(\mathbf{x}) - \bar{\rho}}{\bar{\rho}}, \tag{1}$$

where $\bar{\rho}$ is the expected mean density, which is independent of position because
of statistical homogeneity.

The autocorrelation function of the overdensity field (usually just referred 
to as the correlation function) is defined as 

$$\xi(\mathbf{x}_1, \mathbf{x}_2) = \langle \delta(\mathbf{x}_1)\,\delta(\mathbf{x}_2) \rangle. \tag{2}$$

From statistical homogeneity and isotropy, we have that

$$\xi(\mathbf{x}_1, \mathbf{x}_2) = \xi(\mathbf{x}_1 - \mathbf{x}_2) \tag{3}$$
$$\phantom{\xi(\mathbf{x}_1, \mathbf{x}_2)} = \xi(|\mathbf{x}_1 - \mathbf{x}_2|). \tag{4}$$

To help to understand the correlation function, suppose that we have two small
regions $\delta V_1$ and $\delta V_2$ separated by a distance $r$. Then the expected number of
pairs of galaxies with one galaxy in $\delta V_1$ and the other in $\delta V_2$ is given by

$$\langle n_{\rm pair} \rangle = \bar{n}^2\,[1 + \xi(r)]\,\delta V_1\,\delta V_2, \tag{5}$$

where $\bar{n}$ is the mean number of galaxies per unit volume. We see that $\xi(r)$
measures the excess clustering of galaxies at a separation $r$. If $\xi(r) = 0$, the
galaxies are unclustered (randomly distributed) on this scale - the number
of pairs is just the expected number of galaxies in $\delta V_1$ times the expected
number in $\delta V_2$. $\xi(r) > 0$ corresponds to strong clustering, and $\xi(r) < 0$ to
anti-clustering. Estimation of $\xi(r)$ from a sample of galaxies will be discussed
in Section 6.1.

It is often convenient to consider perturbations in Fourier space. In cos- 
mology the following Fourier transform convention is most commonly used 



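$$\delta(\mathbf{k}) = \int \delta(\mathbf{r})\, e^{i\mathbf{k}\cdot\mathbf{r}}\, d^3r \tag{6}$$

$$\delta(\mathbf{r}) = \int \delta(\mathbf{k})\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, \frac{d^3k}{(2\pi)^3}. \tag{7}$$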



The power spectrum is defined as

$$P(\mathbf{k}_1, \mathbf{k}_2) = \frac{1}{(2\pi)^3} \langle \delta(\mathbf{k}_1)\,\delta^*(\mathbf{k}_2) \rangle. \tag{8}$$

Statistical homogeneity and isotropy gives that

$$P(\mathbf{k}_1, \mathbf{k}_2) = \delta_D(\mathbf{k}_1 - \mathbf{k}_2)\,P(k_1), \tag{9}$$

where $\delta_D$ is the Dirac delta function. The power spectrum is sometimes
presented in dimensionless form

$$\Delta^2(k) = \frac{k^3}{2\pi^2} P(k). \tag{10}$$

The correlation function and power spectrum form a Fourier pair

$$P(k) = \int \xi(r)\, e^{i\mathbf{k}\cdot\mathbf{r}}\, d^3r \tag{11}$$

$$\xi(r) = \int P(k)\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, \frac{d^3k}{(2\pi)^3}, \tag{12}$$

so they provide the same information. The choice of which to use is therefore 
somewhat arbitrary (see for a further discussion of this). 
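The equivalence of the two statistics can be checked numerically. The sketch below is an illustration only: it uses the angle-averaged form of Eq. (12), $\xi(r) = (2\pi^2)^{-1}\int P(k)\,k^2\,[\sin(kr)/(kr)]\,dk$, which follows from integrating over the direction of $\mathbf{k}$ for an isotropic field, and a toy Gaussian power spectrum chosen because its transform is known analytically. The function names and the toy model are assumptions made for this example, not part of any survey pipeline.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule along the last axis (written out to avoid NumPy version differences)."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def xi_from_pk(r, k, pk):
    """Angle-averaged Eq. (12): xi(r) = (1 / 2 pi^2) * int P(k) k^2 sin(kr)/(kr) dk."""
    kr = np.outer(np.atleast_1d(r), k)
    # np.sinc(x) is sin(pi x)/(pi x), so the argument is divided by pi
    integrand = pk * k**2 * np.sinc(kr / np.pi)
    return trapezoid(integrand, k) / (2.0 * np.pi**2)


def dimensionless_power(k, pk):
    """Eq. (10): Delta^2(k) = k^3 P(k) / (2 pi^2)."""
    return k**3 * pk / (2.0 * np.pi**2)


# Toy model (assumption for illustration): P(k) = exp(-k^2 s^2)
s = 1.0
k = np.linspace(1e-4, 12.0, 24001)
pk = np.exp(-(k * s) ** 2)
r = np.array([0.5, 1.0, 2.0])

xi_numeric = xi_from_pk(r, k, pk)
# Analytic transform of the toy model
xi_exact = (4.0 * np.pi * s**2) ** -1.5 * np.exp(-r**2 / (4.0 * s**2))
```

For a realistic power spectrum the integrand oscillates rapidly at large $kr$, so a simple trapezoidal rule like this one would need a much finer grid or a dedicated method.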

The extension of the 2-pt statistics, the power spectrum and the correlation
function, to higher orders is straightforward, with Eq. (5) becoming

$$\langle n_{\rm tuple} \rangle = \bar{n}^n\,[1 + \xi^{(n)}]\,\delta V_1 \cdots \delta V_n. \tag{13}$$

However, the central limit theorem implies that a density distribution is
asymptotically Gaussian in the limit where the density results from the average
of many independent processes. The overdensity field has zero mean by
definition, so a Gaussian overdensity field is completely characterised by either
the correlation function or the power spectrum. Consequently, in this regime,
measuring either the correlation function or the power spectrum provides a
statistically complete description of the field.



4 Matter perturbations

There are three physical stages in the creation and evolution of perturbations
in the matter distribution. First, primordial perturbations are produced in an
inflationary epoch. Second, the different forms of matter within the Universe
affect these primordial perturbations. Third, gravitational collapse leads to
the growth of these fluctuations. In this section we will discuss the form of
the perturbations on scales where gravitational collapse can be described by
a linear change in the overdensity. The gravitational collapse of perturbations
will be considered in Section 5.
Fig. 1. Plots showing the linear power spectrum (solid lines) for a variety of different
cosmological parameters. Only the shapes of the power spectra are compared, and
the amplitudes are matched to the same large scale value. Our base model has
$\Omega_M h = 0.2$, $n_s = 1$, $\Omega_b/\Omega_M = 0$ and $\Omega_\nu/\Omega_M = 0$. Deviations from this base
model are given in each panel. As can be seen, many of the shape distortions from
changing different parameters are similar, which can cause degeneracies between
these parameters when fitting models to observations.



4.1 Why are there matter perturbations?

A period of "faster than light" expansion in the very early Universe solves a
number of problems with standard cosmology. In particular, it allows distant
regions that appear causally disconnected to have been connected in the past
and therefore explains the uniformity of the CMB. Additionally, it drives the
energy density of the Universe close to the critical value and, most importantly
for our discussion of perturbations, it provides a mechanism for producing seed
perturbations as quantum fluctuations in the matter density are increased to
significant levels. For a detailed examination of the creation of fluctuations see
For now, we will just comment that the most basic inflationary models
give a spectrum of fluctuations $P(k) \propto k^n$ with $n \simeq 1$.

4.2 The effect of dark matter

The growth of dark matter fluctuations is intimately linked to the Jeans scale.
Perturbations smaller than the Jeans scale do not collapse due to pressure
support - for collisionless dark matter this is support from internal random
velocities. Perturbations larger than the Jeans scale grow through gravity at
the same rate, independent of scale. In a Universe with just dark matter and
radiation, the Jeans scale grows to the size of the horizon at matter-radiation
equality, and then reduces to zero when the matter dominates. We therefore
see that the horizon scale at matter-radiation equality will be imprinted in the
distribution of fluctuations - this scale marks a turn-over in the growth rate
of fluctuations. What this means in practice is that there is a cut-off in the
power spectrum on small scales, dependent on $\Omega_M h$, with a stronger cut-off
predicted for lower values. This is demonstrated in Fig. 1.

4.3 The effect of baryons

At early epochs baryons are coupled to the photons and, if we consider a
single fluctuation, a spherical shell of gas and photons is driven away from the
perturbation by a sound wave. When the photons and gas decouple, a spherical
shell of baryons is left around a central concentration of dark matter. As the
perturbation evolves through gravity, the density profiles of the baryons and
dark matter grow together, and the perturbation is left with a small increase
in density at a location corresponding to the sound horizon at the end of the
Compton drag epoch [2, 3]. This real-space "shell" is equivalent to oscillations
in the power spectrum. In addition to these acoustic oscillations, fluctuations
smaller than the Jeans scale, which tracks the sound horizon until decoupling,
do not grow, while large fluctuations are unaffected and continue to grow.
The presence of baryons therefore also leads to a reduction in the amplitude
of small scale fluctuations. For more information and fitting formulae for the
different processes a good starting point is [17].

4.4 The effect of neutrinos

The same principle of gravitational collapse versus pressure support can be
applied in the case of massive neutrinos. Initially the neutrinos are relativistic
and their Jeans scale grows with the horizon. As their temperature decreases
their momenta drop, they become non-relativistic, and the Jeans scale decreases -
they can subsequently fall into perturbations. Massive neutrinos are
interesting because even at low redshifts the Jeans scale is cosmologically relevant.
Consequently the linear power spectrum (the fluctuation distribution
excluding the non-linear collapse of perturbations) is not frozen shortly after
matter-radiation equality. Instead its form is still changing at low redshifts.
Additionally, the growth rate depends on the scale - it is suppressed until
neutrinos collapse into perturbations, simply because the perturbations have
lower amplitude. The effect of neutrino mass on the present day linear power
spectrum is shown in Fig. 1. Note that in this plot the relative amplitudes of
the power spectra have been removed - it is just the shape that is compared.
The amplitude would also depend on the combined neutrino mass.



5 The evolution of perturbations

Having discussed the form of the linear perturbations, we will now consider
how perturbations evolve through gravity in the matter and dark energy dominated
regimes. To do this, we will use the spherical top-hat collapse model,
where we compare a sphere of background material with radius $a$, with one
of radius $a_p$ which contains the same mass, but has a homogeneous change in
overdensity. The ease with which the behaviour can be modeled follows from
Birkhoff's theorem, which states that a spherically symmetric gravitational
field in empty space is static and is always described by the Schwarzschild
metric. This gives that the behaviour of the homogeneous sphere of uniform
density and the background can be modeled using the same equations.
For simplicity we initially only consider the sphere of background material.

The sphere of background material behaves according to the standard
Friedmann and cosmology equations

$$\frac{1}{a^2}\left[\frac{da}{d(H_0 t)}\right]^2 = \Omega_M a^{-3} + \Omega_K a^{-2} + \Omega_X a^{f(a)}, \tag{14}$$

$$\frac{1}{a}\frac{d^2a}{d(H_0 t)^2} = -\frac{1}{2}\left\{\Omega_M a^{-3} + [1 + 3w(a)]\,\Omega_X a^{f(a)}\right\}. \tag{15}$$

These equations have been written in a form allowing for a general time-dependent
equation of state for the dark energy $p = w(a)\rho$. Conservation of
energy for the dark energy component provides the form of $f(a)$

$$f(a) = \frac{-3}{\ln a}\int_0^{\ln a} [1 + w(a')]\, d\ln a'. \tag{16}$$

The dark matter and dark energy densities evolve according to

$$\Omega_M(a) = \frac{\Omega_M a^{-3}}{E^2(a)}, \qquad \Omega_X(a) = \frac{\Omega_X a^{f(a)}}{E^2(a)}, \tag{17}$$

where $E^2(a)$ denotes the right-hand side of Eq. (14).
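The background relations in Eqs. (14), (16) and (17) are straightforward to evaluate numerically. The following is a minimal sketch under stated assumptions: scalar scale factors only, a user-supplied callable `w(a)` for the equation of state, and $\Omega_K = 1 - \Omega_M - \Omega_X$ for the curvature term. All function names are hypothetical and chosen for this example.

```python
import numpy as np


def f_exponent(a, w=lambda a: -1.0, n=2001):
    """Eq. (16): f(a) = -3/ln(a) * int_0^{ln a} [1 + w(a')] dln a' (scalar a)."""
    ln_a = np.log(a)
    if abs(ln_a) < 1e-12:
        return -3.0 * (1.0 + w(1.0))  # limit of the integral as a -> 1
    x = np.linspace(0.0, ln_a, n)
    y = 1.0 + np.array([w(np.exp(xi)) for xi in x])
    integral = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return -3.0 * integral / ln_a


def e_squared(a, omega_m, omega_x, w=lambda a: -1.0):
    """Right-hand side of Eq. (14)."""
    omega_k = 1.0 - omega_m - omega_x  # assumption: curvature closes the budget
    return omega_m * a**-3 + omega_k * a**-2 + omega_x * a ** f_exponent(a, w)


def omega_m_of_a(a, omega_m, omega_x, w=lambda a: -1.0):
    """First relation in Eq. (17)."""
    return omega_m * a**-3 / e_squared(a, omega_m, omega_x, w)


def omega_x_of_a(a, omega_m, omega_x, w=lambda a: -1.0):
    """Second relation in Eq. (17)."""
    return omega_x * a ** f_exponent(a, w) / e_squared(a, omega_m, omega_x, w)
```

For a constant equation of state the integral in `f_exponent` reduces to $f = -3(1 + w)$, so $w = -1$ gives $f = 0$ and a constant dark energy density.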



Tracks showing the evolution of $\Omega_M(a)$ and $\Omega_X(a)$ are presented in Fig. 2 for
$h = 0.7$ and constant dark energy equation of state $w = -1$. Of particular
interest are solutions which predict recollapse, but that have $\Omega_X > 0$. Provided
that $\Omega_M \gg \Omega_X$, the perturbation will collapse before the dark energy
dominates. For a cosmology with $\Omega_M \simeq 0.3$ and $\Omega_X \simeq 0.7$, these solutions
correspond to overdense spheres that will collapse and form structure.

Fig. 2. Plot showing the evolution of the matter and vacuum energy densities for
a selection of cosmologies (grey lines) with constant dark energy equation of state
parameter $w = -1$. The critical models that border the different types of evolution
are shown by the black lines. The dotted line highlights $\Omega_X = 0$.

For the perturbation, the cosmology equation can be written

$$\frac{1}{a_p}\frac{d^2a_p}{d(H_0 t)^2} = -\frac{1}{2}\left\{\Omega_M a_p^{-3} + [1 + 3w(a)]\,\Omega_X a^{f(a)}\right\}, \tag{18}$$

where it is worth noting that the dark energy component is dependent on
$a$ rather than $a_p$. This does not matter for $\Lambda$-cosmologies as $f(a) = 0$, and
the $a$ dependence in this term is removed. For other dark energy models,
this dependence follows if the dark energy does not cluster on the scales of
interest. For such cosmological models, we cannot write down a Friedmann
equation for the perturbation because energy is not conserved. We also
have to be more careful using virialisation arguments to analyse the behaviour
of perturbations.

To first order, the overdensity of the perturbation $\delta = a^3/a_p^3 - 1$ evolves
according to

$$\frac{d^2\delta}{d(H_0 t)^2} + \frac{2}{a}\frac{da}{d(H_0 t)}\frac{d\delta}{d(H_0 t)} = \frac{3}{2}\,\Omega_M a^{-3}\,\delta, \tag{19}$$

which is known as the linear growth equation.




Fig. 3. Plot showing the evolution of the scale factor of perturbations with different
initial overdensities, as a function of time in Gyr. A standard cosmology with
$\Omega_M = 0.3$, $\Omega_X = 0.7$, $h = 0.7$, $w = -1$ is assumed. The dashed lines show the
linear extrapolation of the perturbation scales for the two least overdense perturbations.



The evolution of the scale factor of the perturbations is given by the solid
lines in Fig. 3, compared with the background evolution for a cosmology with
$\Omega_M = 0.3$, $\Omega_X = 0.7$, $h = 0.7$, $w = -1$. These data were calculated by
numerically solving Eq. (18). For comparison, the dashed lines were calculated by
extrapolating the initial perturbation scales using the linear growth factor,
calculated from Eq. (19). Dashed lines are only plotted for the two least overdense
perturbations. In comparison, the most overdense perturbations are
predicted to collapse to singularities. However, in practice inhomogeneities
and the non-spherical shape of actual perturbations will mean that the object
virialises with finite extent.
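As an illustration of how a linear growth factor can be obtained numerically, the sketch below integrates the linear growth equation with a fixed-step fourth-order Runge-Kutta scheme. It is not the calculation used for Fig. 3. The assumptions are: a constant equation of state $w$, so that $f(a) = -3(1 + w)$; $\ln a$ as the independent variable, in which the equation becomes $\delta'' = -(2 + d\ln E/d\ln a)\,\delta' + \frac{3}{2}\Omega_M a^{-3} E^{-2}\,\delta$ with $E^2$ the right-hand side of the Friedmann equation; and a start deep in matter domination on the growing mode $\delta = a$. Function names are hypothetical.

```python
import numpy as np


def e2(a, om, ox, w):
    """Dimensionless Hubble rate squared for constant w."""
    ok = 1.0 - om - ox
    return om * a**-3 + ok * a**-2 + ox * a ** (-3.0 * (1.0 + w))


def dlne_dlna(a, om, ox, w):
    """Logarithmic derivative d ln E / d ln a."""
    ok = 1.0 - om - ox
    de2 = (-3.0 * om * a**-3 - 2.0 * ok * a**-2
           - 3.0 * (1.0 + w) * ox * a ** (-3.0 * (1.0 + w)))
    return 0.5 * de2 / e2(a, om, ox, w)


def growth_factor(a_end, om=0.3, ox=0.7, w=-1.0, a_start=1e-3, steps=4000):
    """Linear overdensity at a_end, normalised so that delta = a at early times."""

    def rhs(x, y):
        a = np.exp(x)
        delta, ddelta = y
        source = 1.5 * om * a**-3 / e2(a, om, ox, w)
        return np.array([ddelta,
                         -(2.0 + dlne_dlna(a, om, ox, w)) * ddelta + source * delta])

    x = np.log(a_start)
    h = (np.log(a_end) - x) / steps
    y = np.array([a_start, a_start])  # growing mode: delta = a, d(delta)/dln(a) = a
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * h, y + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h, y + 0.5 * h * k2)
        k4 = rhs(x + h, y + h * k3)
        y = y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        x += h
    return y[0]
```

In an Einstein-de Sitter model ($\Omega_M = 1$, $\Omega_X = 0$) the growing mode is $\delta \propto a$, which provides a simple check of the integrator; in a model with $\Omega_M = 0.3$, $\Omega_X = 0.7$ the growth is suppressed at late times once the dark energy becomes important.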

The evolution of perturbations has a profound effect on the present day
power spectrum of the matter fluctuations on small scales. On the largest
scales, the overdensities are small and linear theory (Eq. 19) holds. This increases
the amplitude of the fluctuations, but does not change the shape of the
power spectrum, as the perturbations all grow at the same rate (except if neutrinos
are cosmologically relevant - see Section 4.4). However, on the smallest
scales, overdensities are large and collapse to virialised structures (e.g. clusters
of galaxies). The effect on the power spectrum is most easily quantified using
numerical simulations, and power spectra calculated from fitting formulae
derived from such simulations [56] are plotted in Fig. 4.
Fig. 4. Plots comparing non-linear (solid lines) and linear power spectra (dotted
lines) at a series of redshifts from $z = 0$ to $z = 5$. In the left panel the raw
dimensionless power spectra are plotted while in the right panel the ratio between
non-linear and linear predictions is shown. As can be seen, on large scales linear
growth simply increases the amplitude of the power spectrum, while on small scales
we also see an increase in power as structures collapse at low redshifts. There is also
a slight decrease in power on intermediate scales - it is this power that is transferred
to small scales. Non-linear power spectra were calculated from the fitting formulae
of [56] with $\Omega_M = 0.3$, $h = 0.7$, $n_s = 1$, and $\Omega_b/\Omega_M = 0.15$.



6 Galaxy survey analysis

6.1 Estimating the correlation function

First suppose that we have a single population of objects forming a Poisson
sampling of the field that we wish to constrain. This is too simple an assumption
for the analysis of modern galaxy redshift surveys, but it will form a
starting point for the development of the analysis tools required.
First we define the (unweighted) galaxy density field

$$n_g(\mathbf{r}) = \sum_i \delta_D(\mathbf{r} - \mathbf{r}_i). \tag{20}$$

The definition of the correlation function then gives

$$\langle n_g(\mathbf{r})\,n_g(\mathbf{r}') \rangle = \bar{n}(\mathbf{r})\,\bar{n}(\mathbf{r}')\,[1 + \xi(\mathbf{r} - \mathbf{r}')] + \bar{n}(\mathbf{r})\,\delta_D(\mathbf{r} - \mathbf{r}'). \tag{21}$$

The final term in this equation relates to the shot noise, and only occurs for
zero separation so can be easily dealt with.

In order to estimate the correlation function, we can consider a series of
bins in galaxy separation and make use of Eq. (21). Suppose that we have
created a (much larger) random distribution of points that form a Poisson
sampling of the volume occupied by the galaxies, then

$$1 + \xi = \frac{\langle DD \rangle}{\langle RR \rangle}, \tag{22}$$

where $DD$ is the number of galaxy-galaxy pairs within our bin in galaxy
separation divided by the maximum possible number of galaxy-galaxy pairs
(i.e. for $n$ galaxies the maximum number of distinct pairs is $n(n-1)/2$).
Similarly $RR$ is the normalised number of random-random pairs, and we can
also define $DR$ as the normalised number of galaxy-random pairs.

If the true mean density of galaxies $\bar{n}(\mathbf{r})$ is estimated from the sample itself
(as is almost always the case), we must include a factor $(1 + \xi_\Omega)$ that corrects
for the systematic offset induced. $\xi_\Omega$ is the mean of the two-point correlation
function over the sampling geometry [23]. Given only a single clustered sample
it is obviously difficult to determine $\xi_\Omega$, and the integral constraint (as it is
known) remains a serious drawback to the determination of the correlation
function from small samples of galaxies.

Because the galaxy and random catalogues are uncorrelated, $\langle DR \rangle =
\langle RR \rangle$, and we can consider a number of alternatives to Eq. (22). In particular,

$$\xi = \frac{\langle DD - 2DR + RR \rangle}{\langle RR \rangle} \tag{23}$$

has been shown to have good statistical properties [34].
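The pair-count estimators described above can be illustrated with a brute-force calculation. The sketch below is only suitable for small catalogues (it builds the full matrix of separations) and assumes the estimator combining the normalised counts as $(DD - 2DR + RR)/RR$. The function names `pair_fraction` and `xi_estimate` are hypothetical.

```python
import numpy as np


def pair_fraction(a, b, edges, auto=False):
    """Pair counts per separation bin, divided by the total number of pairs.

    a, b  : arrays of shape (n, 3) holding Cartesian positions
    edges : bin edges in separation
    auto  : True when a and b are the same catalogue (count each pair once)
    """
    sep = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    if auto:
        sep = sep[np.triu_indices(len(a), k=1)]  # n(n-1)/2 distinct pairs
    counts, _ = np.histogram(sep, bins=edges)
    return counts / sep.size


def xi_estimate(data, rand, edges):
    """Estimator (DD - 2 DR + RR) / RR built from normalised pair counts."""
    dd = pair_fraction(data, data, edges, auto=True)
    rr = pair_fraction(rand, rand, edges, auto=True)
    dr = pair_fraction(data, rand, edges, auto=False)
    return (dd - 2.0 * dr + rr) / rr
```

Applied to an unclustered (Poisson) data catalogue and a larger random catalogue filling the same volume, the estimate should scatter around zero in every bin. Production analyses replace the dense separation matrix with tree-based or gridded pair counting.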



6.2 Estimating the power spectrum

In this section we consider estimating the power spectrum by simply taking a
Fourier transform of the overdensity field. As for our estimation of
the correlation function, suppose that we have quantified the volume occupied
by the galaxies by creating a large random catalogue matching the spatial
distribution of the galaxies, but with no clustering (containing $\alpha$ times as
many objects). The (unnormalised) overdensity field is

$$F(\mathbf{r}) = n_g(\mathbf{r}) - n_r(\mathbf{r})/\alpha, \tag{24}$$

where $n_g$ is given by Eq. (20) and $n_r$ is similarly defined for the random
catalogue.
Taking the Fourier transform of this field, and calculating the power gives

$$\langle |F(\mathbf{k})|^2 \rangle = \int \frac{d^3k'}{(2\pi)^3}\,\left[P(\mathbf{k}') - P(0)\,\delta_D(\mathbf{k}')\right]\,|G(\mathbf{k} - \mathbf{k}')|^2 + \left(1 + \frac{1}{\alpha}\right)\int d^3r\,\bar{n}(\mathbf{r}), \tag{25}$$

where $G(\mathbf{k})$ is the Fourier transform of the window function, defined by

$$G(\mathbf{k}) = \int \bar{n}(\mathbf{r})\, e^{i\mathbf{k}\cdot\mathbf{r}}\, d^3r, \tag{26}$$

and the final term in Eq. (25) gives the shot noise. In contrast to the correlation
function, there is a shot noise contribution at every scale. The integral
constraint has reduced to subtracting a single Dirac delta function from the
centre of the unconvolved power - as before this allows for the fact that we do
not know the mean density of galaxies.
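The statement that shot noise contributes at every scale can be seen in a simple numerical experiment. The sketch below is a deliberately simplified setting rather than the estimator above: a periodic box with uniform mean density, nearest-grid-point assignment, and a normalised overdensity field, for which the expected power of an unclustered sample is $1/\bar{n}$ at all wavenumbers. The function name and default parameters are assumptions made for this example.

```python
import numpy as np


def poisson_power(n_gal=20000, n_grid=32, box=1.0, seed=0):
    """Power on an FFT grid for unclustered points in a periodic box.

    Returns the grid of |delta_k|^2 / V (k = 0 masked with NaN) and the
    expected shot-noise level V / N = 1 / nbar.
    """
    rng = np.random.default_rng(seed)
    pos = rng.random((n_gal, 3)) * box
    idx = np.floor(pos / box * n_grid).astype(int) % n_grid
    counts = np.zeros((n_grid,) * 3)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)  # nearest grid point
    delta = counts / counts.mean() - 1.0
    cell_volume = (box / n_grid) ** 3
    delta_k = np.fft.fftn(delta) * cell_volume  # discrete approximation to the integral
    power = np.abs(delta_k) ** 2 / box**3
    power[0, 0, 0] = np.nan  # the k = 0 mode vanishes once the mean is removed
    return power, box**3 / n_gal


power, shot_noise = poisson_power()
mean_power = np.nanmean(power)
```

The measured power is flat in $k$ and its average matches the shot-noise level. For a real survey the window function and the random catalogue change the normalisation, as in the expression given above.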



6.3 Complications

There are two complications which constitute the main hindrance to using
clustering in galaxy surveys to constrain cosmology. They are redshift space
distortions - systematic deviations in measured redshift in addition to the
Hubble flow, and galaxy bias - the fact that galaxies do not form a Poisson
sampling of the underlying matter distribution. Denoting the measurement of
a quantity in redshift space (galaxy distances calculated from redshifts) by a
superscript $s$ and in real space (true galaxy distances) by $r$, we can write the
measured power spectrum $P^s_{\rm gal}$ as

$$\frac{P^s_{\rm gal}}{P^r_{\rm mass}} = \frac{P^s_{\rm gal}}{P^r_{\rm gal}}\,\frac{P^r_{\rm gal}}{P^r_{\rm mass}}. \tag{27}$$

The first of these terms corresponds to redshift space distortions, while the
second corresponds to galaxy bias.



Redshift space distortions

There are two key mechanisms that systematically distort galaxy redshifts
from their Hubble flow values. First, structures are continually growing
through gravity, and galaxies fall into larger structures. The infall velocity
adds to the redshift, making the distance estimates using the Hubble flow
wrong. This means that clusters of galaxies appear thinner along the line-of-sight,
causing an increase in the measured power. In the distant observer
approximation, the apparent amplitude of the linear density disturbance can
be readily calculated, leading to a change in the power corresponding to

$$P^s_{\rm gal} = P^r_{\rm gal}\,(1 + \beta\mu^2)^2, \tag{28}$$

where $\beta = \Omega_M^{0.6}/b$, $b$ is an assumed linear bias for the galaxies, and $\mu$ is the
cosine of the angle between the velocity vector and the line-of-sight. In the small angle
approximation, we average over a uniform distribution for $\mu$ giving

$$P^s_{\rm gal} = P^r_{\rm gal}\left[1 + \frac{2}{3}\beta + \frac{1}{5}\beta^2\right]. \tag{29}$$

For large redshift surveys of the nearby Universe, the small angle approximation
breaks down, although a linear result can be obtained using a spherical
expansion of the survey (see Section 6.5).

Second, when objects collapse and virialise they attain a distribution with some
velocity dispersion. These random velocities smear out the collapsed object
along the line of sight in redshift space, leading to the existence of linear structures
pointing towards the observer. These structures, known as "fingers-of-god",
can be corrected by matching with a group catalogue and applying a
correction to the galaxy field before analysis [60]. Alternatively, if the pairwise
distribution of velocity differences is approximated by an exponential
distribution, then

$$P^s_{\rm gal} = P^r_{\rm gal}\,(1 + k^2\mu^2\sigma_p^2/2)^{-1}, \tag{30}$$

where $\sigma_p \sim 400\,{\rm km\,s^{-1}}$ is the pairwise velocity dispersion [28].
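The two distortion factors can be written as short functions, and the angle-averaged linear boost can be checked by direct numerical integration over $\mu$. This is a sketch for illustration: the function names are hypothetical, the linear boost is taken as $(1 + \beta\mu^2)^2$ with average $1 + 2\beta/3 + \beta^2/5$, the damping as $(1 + k^2\mu^2\sigma_p^2/2)^{-1}$, and $\sigma_p$ is assumed to have been converted to a length (for example by dividing a velocity in km/s by 100 to obtain $h^{-1}$ Mpc) so that $k\sigma_p$ is dimensionless.

```python
import numpy as np


def linear_boost(beta, mu):
    """Linear redshift-space boost (1 + beta mu^2)^2."""
    return (1.0 + beta * mu**2) ** 2


def linear_boost_averaged(beta):
    """Average of the linear boost over a uniform distribution in mu."""
    return 1.0 + 2.0 * beta / 3.0 + beta**2 / 5.0


def fog_damping(k, mu, sigma_p):
    """Fingers-of-god damping for an exponential pairwise velocity distribution.

    sigma_p must be in the length units conjugate to k.
    """
    return 1.0 / (1.0 + 0.5 * (k * mu * sigma_p) ** 2)


# Numerical check of the angle average (illustrative parameter values)
beta = 0.3**0.6 / 1.0  # Omega_M = 0.3, b = 1
mu = np.linspace(0.0, 1.0, 10001)
boost = linear_boost(beta, mu)
numeric_average = np.sum(0.5 * (boost[1:] + boost[:-1]) * np.diff(mu))
```

The boost is largest along the line of sight ($\mu = 1$), while the damping acts on the same modes at high $k$, so the two effects partially cancel on intermediate scales.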



Galaxy bias

By the simple phrase "galaxy bias" astronomers quantify the "messy" astrophysics
of galaxy formation. It is common to assume a local linear bias
with $\delta_{\rm gal} = b\,\delta_{\rm mass}$, which leads to a simple relation between power spectra
$P_{\rm gal} = b^2 P_{\rm mass}$. If this bias is independent of the scale probed, then there is
nothing to worry about - the galaxy and matter power spectra have the same
shape. However, it is well known that galaxies of different types have different
clustering strengths, as shown by recent analyses.

One simple way of understanding galaxy bias is to use the "halo model",
which has become popular over the last 5 years [54, 42, 13]. First, consider
the distribution of the underlying matter - the power spectrum was shown in
Fig. 4. There are two distinct regimes: on large scales, linear growth holds,
while on small scales the dark matter has formed into halos: it has either
undergone collapse and has virialised, or is on the way to virialisation. Galaxies
pinpoint certain locations within the dark matter halos, according to an
occupation distribution for each galaxy type. This forms a natural environment
in which to model galaxy bias, with galaxies of different luminosities
and types having different occupation distributions depending on the physics of
their formation.

For 2-pt statistics, there are two possibilities for pairs of galaxies. We
could have chosen a pair where both galaxies lie in the same halo - this is
most likely on small scales. Alternatively, the galaxies might be in different
halos - this is most likely on large scales. On large scales, the halos themselves
are biased compared with the matter and we can use the peak-background
split model [5, 55] to estimate the increase in clustering strength. This
limiting large scale value offers a route to determine the masses of the virialised
structures in which particular galaxies live.

Given a linear bias model for each type of galaxy in the sample to be
analysed, it is possible to multiply the contribution of each galaxy to the estimate
of the overdensity field by the inverse of an expected bias [45]. Provided
the bias model is correct (and possibly altered for each scale observed), then
this removes any systematic offset in the recovered power spectrum caused by
galaxy bias. The problem is that we need to have an accurate model of the
galaxy bias in order to remove it.



6.4 Weights

The procedure described in Section 6.2 can be extended to include weights
for each galaxy in order to optimise the analysis [21]. Under the assumptions
that the wavelength of interest $2\pi/k$ is small compared with the survey scale
(i.e. the window is negligible), and that the fluctuations are Gaussian, then
the optimal weight applied to galaxy $i$ is

$$w_i = \frac{1}{1 + \bar{n}(\mathbf{r}_i)\,P(k)}, \tag{31}$$

where $\bar{n}(\mathbf{r}_i)$ is the mean galaxy density at the location of galaxy $i$. At locations
where the mean galaxy density is low, galaxies are weighted equally. Where
the galaxy density is high, we weight by volume. It is worth noting that the
optimal weights also depend on an estimate of the power spectrum to be
measured, and therefore depend on the scale of interest. However, in practice
this dependence is sufficiently weak that very little information is lost by
assuming a constant $P(k)$.

It is possible to include galaxy bias when determining weights and optimising
the analysis in order to recover the most signal. Given a bias for each
galaxy $b_i$ (which can be dependent on any galaxy properties and the scale of
interest), then the optimal weighting is [45]

$$w_i = \frac{b_i^2}{1 + \sum_j \bar{n}(\mathbf{r}_i, b_j)\,b_j^2\,P(k)}, \tag{32}$$

which up-weights the most biased galaxies that contain the strongest cosmological
signal.
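The behaviour of these weights in the low- and high-density limits can be made concrete with a short sketch. The function names are hypothetical, and the bias-dependent weight is written in the form $b_i^2 / [1 + \sum_j \bar{n}_j b_j^2 P]$, with the sum running over the galaxy types present at the location of galaxy $i$; that reading of the weighting scheme is an assumption of this example.

```python
import numpy as np


def density_weight(nbar, pk):
    """Weight 1 / (1 + nbar P) for a galaxy in a region of mean density nbar."""
    return 1.0 / (1.0 + nbar * pk)


def bias_weight(b_i, nbar_by_type, b_by_type, pk):
    """Bias-dependent weight b_i^2 / (1 + sum_j nbar_j b_j^2 P).

    nbar_by_type, b_by_type : mean densities and biases of the galaxy types
    present at the location of the galaxy being weighted.
    """
    nbar_by_type = np.asarray(nbar_by_type, dtype=float)
    b_by_type = np.asarray(b_by_type, dtype=float)
    return b_i**2 / (1.0 + np.sum(nbar_by_type * b_by_type**2) * pk)


# Illustrative numbers only: P in (Mpc/h)^3, nbar in (h/Mpc)^3
pk = 5000.0
w_sparse = density_weight(1e-7, pk)  # nbar P << 1: close to equal weight per galaxy
w_dense = density_weight(1e-1, pk)   # nbar P >> 1: nbar * w tends to 1/P (volume weighting)
```

With a single galaxy type of unit bias the bias-dependent weight reduces to the density-only weight, which provides a quick consistency check.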



6.5 Spherical bases

In Section 6.2 we described the simplest analysis method for a 3-dimensional
galaxy survey - decomposing into a 3D Fourier basis. However, as we discussed
in Section 6.3, redshift-space distortions complicate the situation, and cannot
easily be dealt with using a Fourier basis. By decomposing into a basis that
is separable in radial and angular directions, we can more easily correct such
distortions. A pictorial comparison of the Fourier basis with a radial-angular
separable basis is presented in Fig. 5.

Fig. 5. Comparison of 3D Fourier basis split into 2D and 1D components (right)
with basis of Spherical Harmonics (with $l = 2$ and $m = 0, 1$ - top left) and Spherical
Bessel functions (bottom left).

In this section we provide an overview of a formalism to do this based on
work by [29, 58]. For alternative formalisms see [20, 10]. In comparison
with the Fourier decomposition (Eq. 7), we decompose into a 3D basis of
Spherical Harmonics $Y_{lm}$ and spherical Bessel functions $j_l$

$$\delta(\mathbf{x}) = \sqrt{\frac{2}{\pi}} \int \sum_{lm} \delta_{lm}(k)\, j_l(kx)\, Y_{lm}(\theta, \phi)\, k\, dk. \tag{33}$$

Because of the choice of bases, the transformation $\delta_{lm}(k) \leftrightarrow k\,\delta(\mathbf{k})$ is unitary
so we retain the benefit of working with the Fourier power spectrum

$$\langle \delta_{lm}(k)\, \delta^*_{l'm'}(k') \rangle = P(k)\, \delta_D(k - k')\, \delta_D(l - l')\, \delta_D(m - m'). \tag{34}$$

As in Section 6.2 we have simplified the analysis by not including any galaxy
weights, although these can be introduced into the formalism. Additionally, it
is easier to work with a fixed boundary condition - usually that fluctuations
vanish at some large radius so that we are only concerned with radial modes
that have

$$\frac{d}{dx}\, j_l(kx) = 0, \tag{35}$$

so that the decomposition becomes

$$\delta(\mathbf{x}) = \sum_{lmn} c_{ln}\, \delta_{lmn}\, j_l(k_{ln}x)\, Y_{lm}(\theta, \phi), \tag{36}$$

where $c_{ln}$ is a normalising constant.

In order to analyse the transformed modes, we need a model for $\langle \delta_{lmn}\, \delta_{l'm'n'} \rangle$.
First we deal with the survey volume by introducing a convolution

$$\hat{\delta}_{lmn} = \sum_{l'm'n'} M_{lmn}^{l'm'n'}\, \delta_{l'm'n'}, \tag{37}$$

where

$$M_{lmn}^{l'm'n'} = \int d^3x\, \rho(\mathbf{x})\, j_l(k_{ln}x)\, j_{l'}(k_{l'n'}x)\, Y^*_{lm}(\theta, \phi)\, Y_{l'm'}(\theta, \phi). \tag{38}$$

We can include the effect of linear redshift space distortions by a transform

$$j_l(k_{ln}x^s) \simeq j_l(k_{ln}x^r) + \Delta x_{\rm lin}\, \frac{d}{dx^r}\, j_l(k_{ln}x^r), \tag{39}$$

where

$$\Delta x_{\rm lin} = \beta \sum_{lmn} \frac{1}{k_{ln}^2}\, c_{ln}\, \delta_{lmn}\, \frac{d j_l(k_{ln}x^r)}{dx^r}\, Y_{lm}(\theta, \phi). \tag{40}$$

Here $\beta = \Omega_M^{0.6}/b$. The bias $b$ corrects for the fact that while we measure the
galaxy power spectrum, the redshift space distortions depend on the mass.
We can also introduce a further convolution to correct for the small-scale
fingers-of-god effect

$$\delta_{l'm'n'} = \sum_{l''m''n''} S_{l'm'n'}^{l''m''n''}\, \delta_{l''m''n''}, \tag{41}$$

where

$$S_{l'm'n'}^{l''m''n''} = c_{l'n'}\, c_{l''n''}\, \delta^K_{l'l''}\, \delta^K_{m'm''} \int\!\!\int p(r - y)\, j_{l'}(k_{l'n'}r)\, j_{l''}(k_{l''n''}y)\, r\,dr\, y\,dy, \tag{42}$$

$\delta^K$ is the Kronecker delta, and $p(r - y)$ is the 1-dimensional scattering
probability for the velocity dispersion. It is also possible to include bias and
evolution corrections in the analysis method.

For a given cosmological model, we can use the above formalism to calculate
the covariance matrix $\langle \delta_{lmn}\, \delta_{l'm'n'} \rangle$ for $N$ modes, and then calculate the
likelihood of a given cosmological model assuming that $\delta_{lmn}$ has a Gaussian
distribution

$$\mathcal{L}[\delta_{lmn} \,|\, {\rm model}] = \frac{1}{(2\pi)^{N/2}\, |C|^{1/2}} \exp\left[-\frac{1}{2}\, \delta_{lmn}^T\, C^{-1}\, \delta_{lmn}\right], \tag{43}$$

where $C$ is the matrix of $\langle \delta_{lmn}\, \delta_{l'm'n'} \rangle$.

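In practice a Gaussian likelihood of this kind is evaluated in logarithmic form, and the covariance matrix is factorised rather than inverted explicitly. The sketch below assumes a real-valued mode vector and a symmetric positive-definite covariance matrix. The function name is hypothetical and the approach (Cholesky factorisation) is a common numerical choice, not necessarily the one used in the analyses cited here.

```python
import numpy as np


def gaussian_log_likelihood(modes, cov):
    """ln L = -0.5 * [N ln(2 pi) + ln|C| + d^T C^{-1} d] for a real data vector d."""
    modes = np.asarray(modes, dtype=float)
    cov = np.asarray(cov, dtype=float)
    chol = np.linalg.cholesky(cov)        # cov = chol @ chol.T
    z = np.linalg.solve(chol, modes)      # z @ z equals d^T C^{-1} d
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (modes.size * np.log(2.0 * np.pi) + log_det + z @ z)
```

Comparing cosmological models then amounts to recomputing `cov` for each model and comparing the returned values. Working with the logarithm avoids numerical underflow when the number of modes is large.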
7 Practicalities

7.1 Brief description of redshift surveys

The 2dF Galaxy Redshift Survey (2dFGRS), which is now complete, covers
approximately 1800 square degrees distributed between two broad strips, one
across the South Galactic Pole and the other close to the North Galactic Pole,
plus a set of 99 random 2 degree fields spread over the full southern galactic
cap. The final catalogue contains reliable redshifts for 221 414 galaxies selected
to an extinction-corrected magnitude limit of approximately $b_J = 19.45$ [12].

In contrast, the Sloan Digital Sky Survey (SDSS) is an ongoing photometric
and spectroscopic survey. The SDSS includes two spectroscopic galaxy
surveys: the main galaxy sample, which is complete to a reddening-corrected
Petrosian $r$ magnitude brighter than 17.77, and a deeper sample of luminous
red galaxies selected based on both colour and magnitude [18]. The
SDSS has regular public data releases: the 4th data release in 2005 included
480 000 independent galaxy spectra. When completed, the SDSS will have
obtained spectra for $\sim 10^6$ galaxies.

7.2 Angular mask

Both the recent 2dF Galaxy Redshift Survey (2dFGRS) and the ongoing Sloan Digital
Sky Survey (SDSS) adopted an adaptive tiling system in order to target
photometrically selected galaxies for spectroscopic follow-up. The circular tiles
within which spectra could be taken in a single pointing of the telescope were
adaptively fitted over the survey region, with regions of high galaxy density
being covered by two or more tiles. A region of such tiling is shown in Fig. 6.
This procedure divides the survey into segments, each with a different
completeness - the ratio of good quality spectra to galaxies targeted. It is usually
assumed that this completeness is uniform across each of the segments formed
by overlapping tiles. Understanding this completeness is a major consideration
when performing a large-scale structure analysis of either of these surveys.
Note that the distribution of segments depends on all adjoining targeted tiles,
not just those that have been observed.

As well as understanding the completeness, we also need to consider the
effect of the weather - spectra taken under bad observing conditions will tend
to preferentially give redshifts for nearby rather than distant galaxies. We also
need to worry about bad fields - regions near bright stars where photometric
data is of poor quality. For the SDSS, there are hard limits for the spectroscopic
region depending on how much photometric data was available when
the targeting algorithm was run. All of these effects are well known and can
be included in an analysis.

Fig. 6. Section of the SDSS DR4 angular mask showing the positions of galaxies
with measured redshifts (black dots), the positions of the plates from which the
spectra were obtained (large black circles) and the segments within the mask that
have different completenesses (coloured regions).

7.3 radial distribution 

In addition to the angular distribution of galaxies, we also need to be able
to model the radial distribution - in the formalism introduced in Section 6.2
we need this information in order to create the random catalogue. Perhaps
the best way of doing this is to model the true luminosity function of the
distribution of observed galaxies, and then apply a magnitude cut-off. This
was the procedure adopted in ^Hj. However, the reduction in the amplitude
of the recovered power spectrum caused by fitting to the redshift distribution
is small and it is common to simply fit a functional form to the distribution.
In Fig. 7 we present the distribution of galaxy redshifts in the SDSS DR4
sample compared with a fit of the form 0]

$$f(z) = z^g \exp\left[-\left(\frac{z}{z_c}\right)^b\right] \qquad (44)$$

where $g$, $b$ and $z_c$ are free parameters that have been fitted to the data.

Fig. 7. Redshift distribution of spectroscopically observed galaxies within the SDSS
DR4 with apparent R magnitude less than 17.5 and 17.77 (solid circles). For
comparison we show the best fit model given by Eq. (44) for each distribution (solid
lines).
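A fit of the functional form in Eq. (44) to a binned redshift histogram, followed by drawing redshifts for a random catalogue, might look like the sketch below. Several things here are assumptions made for illustration: the explicit amplitude `amp` (Eq. (44) carries no normalisation), the binning, the made-up "true" parameter values used to generate synthetic counts, and the use of `scipy.optimize.curve_fit` with Poisson-like errors.

```python
import numpy as np
from scipy.optimize import curve_fit

def f_z(z, amp, g, b, z_c):
    """Eq. (44) multiplied by an added amplitude so it can match raw counts."""
    return amp * z**g * np.exp(-(z / z_c)**b)

rng = np.random.default_rng(1)
z_mid = np.linspace(0.01, 0.30, 30)        # hypothetical bin centres
truth = (1.0e5, 2.0, 1.5, 0.10)            # made-up values, NOT a fit to SDSS
counts = rng.poisson(f_z(z_mid, *truth)).astype(float)

# Bounded least-squares fit; bounds keep z_c positive so the model stays finite.
popt, pcov = curve_fit(
    f_z, z_mid, counts,
    p0=(5.0e4, 1.5, 1.2, 0.12),
    sigma=np.sqrt(np.maximum(counts, 1.0)),
    bounds=([0.0, 0.0, 0.1, 1e-3], [np.inf, 10.0, 10.0, 1.0]),
    max_nfev=5000,
)

# Inverse-CDF sampling of redshifts for a random catalogue from the fitted curve.
z_fine = np.linspace(0.0, 0.4, 2001)
cdf = np.cumsum(f_z(z_fine, *popt))
cdf /= cdf[-1]
z_random = np.interp(rng.uniform(size=10000), cdf, z_fine)
```

The parameters $g$, $b$ and $z_c$ are strongly correlated, so the fitted curve is usually better determined than any individual parameter; for the random catalogue only the curve matters.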



8 results from recent surveys 
8.1 results 

In Table 1 we summarise recent cosmological constraints derived from the
2dFGRS and SDSS. In order to provide a fair test of different analyses, we
have only presented best-fit parameters and errors for $\Omega_M h$, fixing the other
important parameters. Degeneracies between parameters, caused by the
similarity between power spectrum shapes shown in Fig.n, mean that it is only
the most recent analyses of the largest samples that can simultaneously
constrain 2 or more of these parameters. In Table 1 we also present the number
of galaxy redshifts used in each analysis.

Table 1. Summary of recent cosmological constraints from 2dFGRS and SDSS
galaxy redshift surveys. To try to provide a fair comparison, we only present the
best-fit value and quoted error for $\Omega_M h$ assuming that all other cosmological
parameters are fixed ($n_s = 1$, $h = 0.72$, $\Omega_b/\Omega_M = 0.17$, $\Omega_\nu/\Omega_M = 0.0$), and
marginalise over the normalisation.

survey     reference   galaxy redshifts   method                 $\Omega_M h$

2dFGRS     P]          166490             Fourier                0.206 ± 0.023
2dFGRS     ma          142756             Spherical Harmonics    0.215 ± 0.035
2dFGRS     nni         221414             Fourier                0.172 ± 0.014
SDSS       pi          205484             KL analysis            0.207 ± 0.030
SDSS                   205443             Spherical Harmonics    0.225 ± 0.040
SDSS LRG   m           46748              correlation function   0.185 ± 0.015


The power spectra recovered from these analyses are compared in Fig. 8.
We have corrected each for survey window function effects using the best-fit
model power spectrum. The amplitudes have also been matched, so this plot
merely shows the shapes of the spectra. It is clear that the general shape of
the galaxy power spectrum is now well known, and the turn-over is detected at
high significance. The exact position of the turn-over is, however, less well
known, and by examining the final column of Table 1 we see that there are
discrepancies between recent analyses at the $\sim 2\sigma$ level.

9 combination with CMB data 

In this section we consider recent CMB observations and see how the comple- 
mentarity between CMB and large scale structure constraints can break de- 
generacies inherent in these data. The major steps required in a joint analysis 
are described, leading up to Section 9.5 in which we present the constraints
from an example fit to recent data. 

9.1 cosmological models 

Before we start looking at constraining cosmological models using CMB and 
galaxy P(k) data, it is worth briefly introducing the set of commonly used 
cosmological parameters (for further discussion see the recent review by [33]).
It is standard to assume Gaussian, adiabatic fluctuations, and we will not 
discuss alternatives here. It is possible to parameterise the cosmological model 
using a number of related sets of parameters. It is vital in any analysis that the 
model that is being fitted to the data is fully specified - including parameters 
and assumed priors. Many parameters have values that simplify the theory 
from which the models are calculated (e.g. the assumption that the total 
density in the Universe is equal to the critical density). Whether the data
justify dropping one of these assumptions is an interesting Bayesian question
[38], which is outside the remit of the overview presented here, and we will
simply introduce the parameters commonly used and possible assumptions
about their values.

[Figure 8: power spectrum shape plotted against $k$ / $h\,{\rm Mpc}^{-1}$ (upper axis) and
$\log_{10} k$ / $h\,{\rm Mpc}^{-1}$ (lower axis). Legend: 2dFGRS - Cole et al. 2005, Percival et
al. 2001, Percival 2004, Tegmark et al. 2002; SDSS - Tegmark et al. 2004.]

Fig. 8. Plot comparing galaxy power spectra calculated by different analysis
techniques for different surveys. The redshift-space power spectrum calculated by [10]
(solid circles with $1\sigma$ errors shown by the shaded region) is compared with other
measurements of the 2dFGRS power spectrum shape by [43] - open circles, [46] -
solid stars, [59] - open stars. Where appropriate the data have been corrected to
remove effects of the survey volume, by calculating the effect on a model power
spectrum with $\Omega_M h = 0.168$, $\Omega_b/\Omega_M = 0.0$, $h = 0.72$ and $n_s = 1$. A zero-baryon model
was chosen in order to avoid adding features into the power spectra. All of the data
are renormalized to match the power spectrum of [10]. The open triangles show the
uncorrelated SDSS real space $P(k)$ estimate of [50], calculated using their 'modeling
method' with no FOG compression (their Table 3). These data have been corrected
for the SDSS window as described above for the 2dFGRS data. The solid line shows
a model linear power spectrum with $\Omega_M h = 0.168$, $\Omega_b/\Omega_M = 0.17$, $h = 0.72$, $n_s = 1$
and normalization matched to the 2dFGRS power spectrum.

First, we need to know the geometry of the Universe, parameterised by
total energy density $\Omega_{\rm tot}$, or the curvature $\Omega_k$, with the "simplified" value
being that the energy density is equal to the critical value ($\Omega_{\rm tot} = 1$, $\Omega_k = 0$).
We also need to know the constituents of the energy density, which we
parameterise by the dark matter density $\Omega_c$, baryon density $\Omega_b$, and neutrino
density $\Omega_\nu$, although it is commonly assumed that the combined neutrino
mass has negligible cosmological effect. The combined matter density
$\Omega_M = \Omega_c + \Omega_b + \Omega_\nu$ could also be defined as a parameter, replacing one
of the other density measurements. We also need to specify the dark
energy properties, particularly the equation of state $w(a)$, which is commonly
assumed to be constant $w(a) = -1$, so this field is equivalent to $\Lambda$. The
perturbations after inflation are specified by the scalar spectral index $n_s$, with
$n_s = 1$ being the most simple assumption. Possible running of this spectral
index is parameterised by $\alpha = dn_s/dk$ if included. A possible tensor
contribution, parameterised by the tensor spectral index $n_t$ and tensor-to-scalar
ratio $r$, is sometimes explicitly included. The evolution to present day is
parameterised by the Hubble constant $h$, and for the CMB the optical depth to
the last-scattering surface $\tau$. Finally, three parameters that are often ignored and
marginalised over are the galaxy bias $b(k)$ (often assumed to be constant) and
the CMB beam $B$ and calibration $C$ errors.
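The relations between these parameter sets can be made concrete with a short sketch. It assumes a flat model ($\Omega_{\rm tot} = 1$) with constant $w = -1$ and neglects radiation, so that the vacuum density is simply $1 - \Omega_M$; the class and attribute names are hypothetical, and the numerical values are purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class FlatModel:
    """Physical densities and h for a flat model (radiation neglected)."""
    omega_c_h2: float    # dark matter density, Omega_c h^2
    omega_b_h2: float    # baryon density, Omega_b h^2
    omega_nu_h2: float   # neutrino density, Omega_nu h^2
    h: float             # Hubble constant in units of 100 km/s/Mpc

    @property
    def omega_m(self):
        # Omega_M = Omega_c + Omega_b + Omega_nu
        return (self.omega_c_h2 + self.omega_b_h2 + self.omega_nu_h2) / self.h**2

    @property
    def omega_lambda(self):
        # Flatness assumption: Omega_tot = 1
        return 1.0 - self.omega_m

    @property
    def omega_m_h(self):
        # Combination that controls the shape of the galaxy power spectrum
        return self.omega_m * self.h

    @property
    def baryon_fraction(self):
        return self.omega_b_h2 / (self.omega_m * self.h**2)

# Illustrative values only.
model = FlatModel(omega_c_h2=0.106, omega_b_h2=0.0235, omega_nu_h2=0.0, h=0.718)
```

Writing the model this way makes explicit which quantities are fitted and which are derived, which is the point made above about fully specifying the model and its priors.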

9.2 the MCMC technique 

Large multi-parameter likelihood calculations are computationally expensive 
using grid-based techniques. Consequently, the Markov-Chain Monte-Carlo 
(MCMC) technique is commonly used for such analyses. While there is publicly
available code to calculate cosmological model constraints, the basic
method is extremely simple and relatively straightforward to code.

The MCMC method provides a mechanism to generate a random sequence
of parameter values whose distribution matches the posterior probability
distribution of a Bayesian analysis. Chains are sequentially calculated using the
Metropolis algorithm [39]: given a chain at position $x$, a candidate point $x'$ is
chosen at random from a proposal distribution $f(x'|x)$. This point is always
accepted, and the chain moves to point $x'$, if the new position has a higher
likelihood. If the new position $x'$ is less likely than $x$, then $x'$ is accepted,
and the chain moves to point $x'$, with probability given by the ratio of the
likelihood of $x'$ to the likelihood of $x$. In the limit of an infinite number of
steps, the chains will reach a converged distribution where the distribution of
chain links is representative of the likelihood hyper-surface, given any
symmetric proposal distribution $f(x'|x) = f(x|x')$ (the Ergodic theorem: see, for
example, [51]).
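The acceptance rule just described can be written in a few lines. The sketch below is a generic Metropolis sampler with a fixed, symmetric Gaussian proposal; the toy two-parameter Gaussian likelihood, its centre and widths, and all names are assumptions chosen only so that the example runs quickly, and uniform priors are implied so that the posterior is proportional to the likelihood.

```python
import numpy as np

def metropolis(log_like, x0, step, n_steps, rng):
    """Metropolis chain with a symmetric Gaussian proposal of width `step`."""
    x = np.asarray(x0, dtype=float)
    logl = log_like(x)
    chain = np.empty((n_steps, x.size))
    n_accept = 0
    for i in range(n_steps):
        x_new = x + step * rng.standard_normal(x.size)
        logl_new = log_like(x_new)
        # Always accept a more likely point; otherwise accept with
        # probability equal to the likelihood ratio L(x') / L(x).
        if np.log(rng.uniform()) < logl_new - logl:
            x, logl = x_new, logl_new
            n_accept += 1
        chain[i] = x          # a rejected step repeats the current position
    return chain, n_accept / n_steps

# Toy likelihood: uncorrelated Gaussian (hypothetical centre and widths).
centre = np.array([0.3, 0.7])
width = np.array([0.05, 0.10])

def toy_log_like(x):
    return -0.5 * np.sum(((x - centre) / width) ** 2)

rng = np.random.default_rng(42)
chain, acceptance = metropolis(toy_log_like, [0.5, 0.5], width, 20000, rng)
posterior_mean = chain[2000:].mean(axis=0)   # discard an initial burn-in
```

Working with log-likelihoods avoids numerical underflow when the likelihood ratio is very small, and recording the repeated position on rejection is what makes the chain density follow the likelihood.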






It is common to implement dynamic optimisation of the sampling of the 
likelihood surface (see for examples). Again, it is simple to assume a
multi-variate Gaussian proposal function, centered on the current chain position.
Given such a proposal distribution, and an estimate of the covariance matrix 
for the likelihood surface at each step, the optimal approach for a Gaussian 
likelihood would proceed as follows. 

Along each principal direction corresponding to an eigenvector of the
covariance matrix, the variance $\sigma^2$ of the multi-variate Gaussian proposal
function should be set to be a fixed multiple of the corresponding eigenvalue of
the covariance matrix. To see the reasoning behind this, consider translating
from the original 17 parameters to the set of parameters given by the
decomposition along the principal directions of the covariance matrix each divided
by the standard deviation in that direction. In this basis, the likelihood
function is isotropic and the parameters are uncorrelated. Clearly an optimized
proposal function will be the same in each direction, and we have adjusted
the proposal function to have precisely this property. There is just a single
parameter left to optimize - we are free to multiply the width of the proposal
function by a constant in all directions. But we know that the optimal
fraction of candidate positions that are accepted should be $\sim 0.25$ [23], so we can
adjust the normalization of the proposal width to give this acceptance
fraction. Note that the dynamic changing of the proposal function width violates
the symmetry of the proposal distribution $f(x'|x)$ assumed in the Metropolis
algorithm. However, this is not a problem if we only use sections of the chains
where variations between estimates of the covariance matrix are small.
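A proposal function built along the principal directions of an estimated covariance matrix could be sketched as below. The function name, the example covariance matrix and the value of the overall `scale` are assumptions for illustration; in practice `scale` is the single free constant that would be tuned until the acceptance fraction is close to the target.

```python
import numpy as np

def make_proposal(cov, scale, rng):
    """Gaussian proposal whose variance along each eigenvector of `cov`
    is `scale` times the corresponding eigenvalue."""
    evals, evecs = np.linalg.eigh(cov)       # principal directions of cov
    sigma = np.sqrt(scale * evals)           # step width in each direction

    def propose(x):
        # Independent Gaussian steps in the rotated (uncorrelated) basis,
        # mapped back to the original parameters.
        step = sigma * rng.standard_normal(evals.size)
        return np.asarray(x, dtype=float) + evecs @ step

    return propose

rng = np.random.default_rng(7)
cov = np.array([[0.04, 0.03],
                [0.03, 0.09]])               # hypothetical covariance estimate
propose = make_proposal(cov, scale=0.5, rng=rng)

# The proposed steps should have covariance scale * cov.
draws = np.array([propose(np.zeros(2)) for _ in range(20000)])
sample_cov = np.cov(draws.T)
```

Because the proposal depends only on the difference between the candidate and the current position, it remains symmetric for as long as the covariance estimate and `scale` are held fixed.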

The remaining issue is convergence - how do we know when we have
sufficiently long chains that we have adequately sampled the posterior probability?
A number of tests are available [23], although it is always a good idea to
perform a number of sanity checks as well - for example, do we get the same
result from different chains started at widely separated locations in parameter
space?
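One widely used way of quantifying the comparison between chains is the Gelman-Rubin statistic, which compares the variance between chain means with the variance within chains. It is shown here as an example of such a test, on the assumption that it is representative; it is not necessarily the test used in the references cited above, and the synthetic chains are hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R for each parameter.

    chains -- array of shape (n_chains, n_steps, n_params), burn-in removed.
    Values close to 1 suggest the chains sample the same distribution.
    """
    chains = np.asarray(chains, dtype=float)
    n_steps = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean(axis=0)          # W
    between = n_steps * chains.mean(axis=1).var(axis=0, ddof=1)  # B
    var_hat = (n_steps - 1) / n_steps * within + between / n_steps
    return np.sqrt(var_hat / within)

rng = np.random.default_rng(3)
# Four synthetic chains drawn from the same distribution ...
mixed = rng.standard_normal((4, 5000, 2))
# ... and four whose means differ, mimicking chains that have not converged.
offsets = np.arange(4.0)[:, None, None]
stuck = mixed + offsets

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

A threshold on R (values within a few per cent of unity are often required) gives a quantitative version of the sanity check of starting chains at widely separated positions.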

9.3 introduction to the CMB 

Over the past few years there has been a dramatic improvement in the res- 
olution and accuracy of measurements of fluctuations in the temperature of 
the CMB radiation. The discovery of features, in particular, the first acous- 
tic peak, in the power spectrum of the CMB temperature has led to a new 
data-rich era in cosmology (7J [57j. More recently a significant leap forward
was made with the release of the first year data from the WMAP satellite
[HI EOj. The relative positions and heights of the acoustic peaks encode
information about the values of the fundamental cosmological parameters, as
discussed for the matter power spectrum in Section^. For a flat cosmological
model with $n_s = 1$, $\Omega_M = 0.3$, $h = 0.7$ and $\Omega_b h^2 = 0.02$ the CMB and matter
power spectra are compared in Fig. 9. In order to create Fig. 9 the angular
CMB power spectrum was converted to comoving scales by considering the
comoving scale of the fluctuations at the last scattering surface. In Fig. 9 the
matter power spectrum has been ratioed to a smooth zero baryon model in
order to highlight features - even so, the baryon oscillations are significantly
more visible in the CMB fluctuation spectrum. The vertical dotted lines in
this plot are located at the peaks in the CMB spectrum and highlight the
phase offset between the two spectra. The CMB peaks are $\pi/2$ out of phase
with the matter peaks because they occur where the velocity is maximum,
rather than the density at the last scattering surface - this is known as the
velocity overshoot. Additionally there is a projection effect - the observed
CMB spectrum is the 2D projection of 3D fluctuations, and so is convolved
with an asymmetric function: the projection can increase, but not decrease
the wavelength of a given fluctuation.

[Figure 9: two panels sharing the horizontal axis $k$ / $h\,{\rm Mpc}^{-1}$; panel title
$\Omega_m = 0.3$, $\Omega_v = 0.7$, $h = 0.7$, $\Omega_b h^2 = 0.02$.]

Fig. 9. Plot comparing large scale structure (lower panel) and CMB (upper panel)
power spectra. The angular CMB power spectrum was converted to comoving scales
using the comoving distance to the last scattering surface. The matter power
spectrum (solid - linear, dashed - non-linear, present day), has been ratioed to a smooth
model with zero baryons in order to highlight the baryonic features. Dotted lines
show the positions of the peaks in the CMB spectrum.
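The conversion from multipole to comoving wavenumber can be approximated by $k \simeq \ell / d_{\rm LS}$, where $d_{\rm LS}$ is the comoving distance to the last scattering surface. The sketch below evaluates this for the flat model quoted above; the redshift of last scattering ($z = 1100$), the radiation density $\Omega_r h^2 \approx 4.2 \times 10^{-5}$, the simple $k = \ell/d$ mapping and the illustrative multipoles are all assumptions of this example, not details taken from the construction of Fig. 9.

```python
import numpy as np

C_OVER_H0 = 2997.92458   # c / (100 km/s/Mpc), i.e. Hubble distance in h^-1 Mpc

def comoving_distance(z, omega_m, h, omega_r_h2=4.2e-5, n_steps=200_000):
    """Comoving distance to redshift z in h^-1 Mpc for a flat model with
    matter, radiation and a cosmological constant."""
    omega_r = omega_r_h2 / h**2
    omega_l = 1.0 - omega_m - omega_r            # flatness assumed
    a = np.linspace(1.0 / (1.0 + z), 1.0, n_steps)
    e_of_a = np.sqrt(omega_m / a**3 + omega_r / a**4 + omega_l)   # H(a) / H0
    integrand = 1.0 / (a**2 * e_of_a)            # d(chi) = c da / (a^2 H)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(a))
    return C_OVER_H0 * integral

d_ls = comoving_distance(1100.0, omega_m=0.3, h=0.7)

# Map a few illustrative multipoles to comoving wavenumbers in h Mpc^-1.
ell = np.array([220.0, 540.0, 800.0])
k = ell / d_ls
```

With this mapping the first acoustic peak of the CMB falls at wavenumbers of a few times $10^{-2}\,h\,{\rm Mpc}^{-1}$, which is the range probed by the galaxy surveys discussed earlier.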

A compilation of recent CMB data is presented in Fig. 10. Here we have
plotted both the temperature-temperature (TT) auto-power spectrum and
the temperature-E-mode polarisation (TE) cross-power spectrum. The most
significant current data set is, of course, the WMAP data shown by the solid
circles in this figure. However, additional information is provided on small
scales by a number of other experiments. In Fig. 10 we plot data from the
CBI E2J, VSA H3|, and ACBAR |S21 experiments.

[Figure 10: horizontal axis is multipole.]

Fig. 10. Upper panel: The 1-yr WMAP TT power spectrum (black circles) is plotted
with the CBI (red triangles), VSA (green squares) and ACBAR (blue stars) data at
higher $\ell$. Lower panel: The 1-yr WMAP TE power spectrum (black circles). In both
panels the solid black line shows the best fit model calculated from fitting the CMB
data.

Likelihood surfaces from a multi-parameter fit to these CMB data are
shown in Fig. 11. For this fit, 7 parameters were allowed to vary: $\Omega_c h^2$, $\Omega_b h^2$,
$h$, $\tau$, $n_s$, $\sigma_8$ and $\Omega_\nu h^2$. Other cosmological parameters were set at their "model
simplification" values as discussed in Section 9.1. In particular, we have
assumed a flat cosmological model with $\Omega_{\rm tot} = 1$ and that the tensor
contribution to the CMB is negligible. In choosing this set of 7 parameters, and using
the standard MCMC technique we have implicitly assumed uniform priors for
each. The constraints on the 7 fitted parameters are given in Table 2.



9.4 parameter degeneracies in the CMB data 

By examining Fig. 11 we see that the CMB data alone do not constrain
all of the fundamental cosmological parameters considered to high precision.
Degeneracies exist between certain combinations of parameters which lead to
CMB fluctuation spectra that cannot be distinguished by current data [16].
To help to explain how these degeneracies arise, CMB models with different
cosmological parameters are plotted in Fig. 12.

Fig. 11. 2D projections of the 7D likelihood surface resulting from a fit to the CMB
data plotted in Fig. 10. The shading represents areas with $-2\Delta\ln\mathcal{L} = 2.3, 6.0, 9.2$
corresponding to $1\sigma$, $2\sigma$ and $3\sigma$ confidence intervals for multi-parameter Gaussian
random variables. There are two primary degeneracies - between $\Omega_c h^2$ and $h$ and
between $n_s$, $\tau$ and $\Omega_b h^2$, which are discussed further in Section 9.4.

Fig. 12. As Fig. 10 but now showing 3 different models: the dashed line shows the
best fit model in all panels - the model plotted in Fig. 10. The solid lines in the
top-left panel were calculated with $h \pm 0.1$, top-right $\Omega_c \pm 0.1$, bottom-left $\tau + 0.3$
and $\tau = 0$, and bottom-right $n_s \pm 0.2$.

Table 2. Summary of cosmological parameter constraints calculated by fitting a
7-parameter cosmological model to the CMB data plotted in Fig. 10 and to the
combination of these data with the measurement of the 2dFGRS power spectrum
[10] - see text for details. Data are given with $1\sigma$ error, except for $\Omega_\nu h^2$ which is
presented as a $1\sigma$ upper limit.

parameter          CMB constraint      CMB+2dFGRS constraint

$\Omega_c h^2$     0.107 ± 0.015       0.106 ± 0.006
$\Omega_b h^2$     0.0238 ± 0.0021     0.0235 ± 0.00166
$h$                0.725 ± 0.096       0.718 ± 0.036
$\tau$             < 0.204 ± 0.117     < 0.195 ± 0.085
$n_s$              1.00 ± 0.064        0.987 ± 0.046
$\sigma_8$         0.703 ± 0.125       0.696 ± 0.085
$\Omega_\nu h^2$   < 0.00700           < 0.006
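The contour thresholds quoted in the caption of Fig. 11 can be related to enclosed probability with a short calculation. For two jointly estimated Gaussian parameters, $-2\Delta\ln\mathcal{L}$ follows a $\chi^2$ distribution with two degrees of freedom, whose cumulative distribution is $1 - \exp(-x/2)$. The sketch below uses only this standard result; the function names are hypothetical.

```python
import math

def enclosed_probability(delta):
    """Probability enclosed by a contour at -2 Delta ln L = delta,
    for two jointly estimated Gaussian parameters."""
    return 1.0 - math.exp(-0.5 * delta)

def threshold(probability):
    """Inverse relation: contour level enclosing the given probability."""
    return -2.0 * math.log(1.0 - probability)

caption_levels = [2.3, 6.0, 9.2]
fractions = [enclosed_probability(d) for d in caption_levels]

# Level that would enclose the Gaussian 2-sigma probability of 95.45 per cent.
two_sigma_level = threshold(0.9545)
```

Under this two-parameter Gaussian assumption the three quoted levels enclose roughly 68.3, 95.0 and 99.0 per cent of the probability, so the outer two contours are slightly tighter than strict $2\sigma$ and $3\sigma$ regions.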

Constraining models to be flat does not fully break the geometrical
degeneracy present when considering models with varying $\Omega_{\rm tot}$, and a degeneracy
between the dark matter density $\Omega_c$ and the Hubble parameter $h$ remains.
Fig. 12 shows that both $\Omega_c$ and $h$ affect the location of the first acoustic
peak. A simple argument can be used to show that models with the same
value of $\Omega_M h^{3.4}$ predict the same apparent angle subtended by the light
horizon and therefore the same location for the first acoustic peak in the TT power
spectrum [44]. The degeneracy in Fig. 11 roughly follows this prediction.
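The degeneracy direction implied by a constant value of $\Omega_M h^{3.4}$ can be traced with a one-line relation. In the sketch below the reference point ($\Omega_M = 0.3$, $h = 0.7$) and the function name are assumptions chosen for illustration.

```python
import numpy as np

def h_on_degeneracy(omega_m, omega_m_ref=0.3, h_ref=0.7, index=3.4):
    """Value of h that keeps omega_m * h**index equal to its value at the
    reference point (omega_m_ref, h_ref)."""
    return h_ref * (omega_m_ref / np.asarray(omega_m, dtype=float)) ** (1.0 / index)

omega_m = np.linspace(0.2, 0.4, 5)
h_line = h_on_degeneracy(omega_m)

# Physical matter density along the line, for comparison with axes
# expressed in terms of Omega h^2.
omega_m_h2 = omega_m * h_line**2
```

Because the index is larger than 2, the physical density $\Omega_M h^2$ is not constant along this line: it increases as $h$ decreases, which is why a tilted degeneracy appears in the plane of physical density against $h$.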

There is another degeneracy that can be seen in Fig. 11 between $n_s$,
$\tau$ and $\Omega_b h^2$. From Fig. 12 we see that the effect of the optical depth $\tau$ on the
shape of the TT power spectrum occurs predominantly at low multipoles. By
adjusting the tilt of the primordial spectrum ($n_s$), the low-$\ell$ power spectrum
can be approximately corrected for the change in $\tau$, and the high-$\ell$ end can be
adjusted by changing the baryon density. This degeneracy is weakly broken
by the TE data which provide an additional constraint on $\tau$.

9.5 results from the combination of LSS and CMB data 

The CMB degeneracy between $\Omega_c$ and $h$ can be broken by including additional
constraints from the power spectrum of galaxy clustering. There have been
a number of studies using both CMB and large-scale structure data to set
cosmological constraints, with a seminal paper coming from the WMAP
collaboration [^Jj. Recently new small-scale CMB data and large-scale structure
analyses have increased the accuracy to which the cosmological parameters
are known [611152).

In Fig. 13 we provide a likelihood plot as in Fig. 11 but now including
the cosmological constraints from the final 2dFGRS power spectrum [10]. For
this analysis, a constant bias was assumed and we fitted the galaxy power
spectrum over the range $0.02 < k < 0.15\,h\,{\rm Mpc}^{-1}$. The derived parameter
constraints for the 7 parameters varied are compared with the constraints
from fitting the CMB data only in Table 2. The physical neutrino density
$\Omega_\nu h^2$ is unconstrained within the prior interval (physically, it must be $> 0$),
so we only provide an upper limit.

Fig. 13. As Fig. 11 but now including extra constraints from the 2dFGRS
analysis of [10]. These constraints help to break the primary degeneracies discussed in
Section 9.4.
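The way the galaxy data enter such a joint fit can be sketched as an extra term in the log-likelihood evaluated at each chain step. The example below is a simplified illustration under several assumptions that are not taken from the analysis above: uncorrelated errors on each band-power, no survey window convolution, and a free amplitude (standing in for a constant bias squared) that is fixed at its best-fit value rather than formally marginalised. All names and the toy power spectrum are hypothetical.

```python
import numpy as np

def galaxy_chi2(k, p_obs, sigma, p_model, k_min=0.02, k_max=0.15):
    """Chi-square of a model shape against measured band-powers, with the
    overall amplitude set analytically to its best-fit value."""
    use = (k > k_min) & (k < k_max)            # fitted range in h Mpc^-1
    d, m, s = p_obs[use], p_model[use], sigma[use]
    amp = np.sum(d * m / s**2) / np.sum(m**2 / s**2)
    return np.sum(((d - amp * m) / s) ** 2), amp

def joint_log_like(log_like_cmb, chi2_galaxy):
    """Independent data sets: log-likelihoods add."""
    return log_like_cmb - 0.5 * chi2_galaxy

# Toy data: a power-law shape observed with a constant amplitude offset.
k = np.linspace(0.01, 0.2, 40)
p_model = 1.0e4 * k ** -1.2
p_obs = 1.7 * p_model
sigma = 0.1 * p_obs

chi2, amp = galaxy_chi2(k, p_obs, sigma, p_model)
total = joint_log_like(log_like_cmb=-10.0, chi2_galaxy=chi2)
```

Because only the shape of the galaxy power spectrum is used, the galaxy term constrains shape parameters such as $\Omega_M h$, which is what helps to break the degeneracy with $h$ left by the CMB data.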

A Table of parameter constraints, such as that presented in Table 2,
represents the end point of our story. We have introduced the major steps required
to utilise a galaxy survey to provide cosmological parameter constraints, and
have ended up with an example of a set of constraints for a particular model.